You are viewing the RapidMiner Studio documentation for version 10.0.
Replace Tokens (Text Processing)
Synopsis
Replaces all occurrences of each specified regular expression within each token with its specified replacement.
Description
This operator replaces substrings within each token. The user can specify arbitrary what/by pairs in the replace_dictionary parameter: the left column of the table specifies what should be replaced, and the right column specifies the replacement. Since replacement is not performed across token boundaries, this operator is best placed directly after the TextInput operator, or at least before the tokenizer. Regular expressions can be used to specify what should be replaced.
Please remember that several characters have special meanings in regular expressions and might have to be quoted if they are meant as the literal character. The replacement can be defined as an arbitrary string. Capturing groups of the defined regular expression can be accessed with $1, $2, $3, and so on. Empty strings are not allowed as replacements, but since a following tokenizer will simply skip additional blanks, feel free to replace with a blank.
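The behavior described above can be illustrated outside RapidMiner with plain Java regular expressions, which use the same $1, $2 notation for capturing groups. This is only a minimal sketch and not the operator's actual implementation: the class ReplaceTokensSketch and the method applyDictionary are hypothetical names, and applying the dictionary entries in insertion order is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ReplaceTokensSketch {

    // Hypothetical helper: applies every what/by pair to each token separately,
    // so a pattern can never match across a token boundary.
    static List<String> applyDictionary(List<String> tokens, Map<String, String> dictionary) {
        List<String> result = new ArrayList<>();
        for (String token : tokens) {
            String current = token;
            // Assumption: pairs are applied in the order they were entered.
            for (Map.Entry<String, String> pair : dictionary.entrySet()) {
                // Key is the regular expression ("what"), value the replacement ("by").
                current = current.replaceAll(pair.getKey(), pair.getValue());
            }
            result.add(current);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> dictionary = new LinkedHashMap<>();
        // Swap two numbers around a hyphen using capturing groups $1 and $2.
        dictionary.put("(\\d+)-(\\d+)", "$2/$1");
        // The dot is a special character, so it is quoted with a backslash;
        // it is replaced with a blank rather than an empty string.
        dictionary.put("\\.", " ");

        List<String> tokens = List.of("10-2023", "a.b");
        System.out.println(applyDictionary(tokens, dictionary));
        // prints [2023/10, a b]
    }
}
```

Because each token is processed on its own, a pattern such as "new york" would not match if "new" and "york" had already been split into separate tokens, which is why the operator is best placed before the tokenizer.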
Input
- document
The document port.
Output
- document
The document port.
Parameters
- replace_dictionary
Defines the replacements. Range: